nlp_architect.pipelines.spacy_bist.SpacyBISTParser

class nlp_architect.pipelines.spacy_bist.SpacyBISTParser(verbose=False, spacy_model='en', bist_model=None)

Main class which handles parsing with Spacy-BIST parser.

Parameters:
  • verbose (bool, optional) – Controls output verbosity.
  • spacy_model (str, optional) – spaCy model to use (see https://spacy.io/api/top-level#spacy.load).
  • bist_model (str, optional) – Path to a .model file to load. Defaults to the pre-trained model.
__init__(verbose=False, spacy_model='en', bist_model=None)

Initialize self. See help(type(self)) for accurate signature.

Methods

__init__([verbose, spacy_model, bist_model]) Initialize self.
parse(doc_text[, show_tok, show_doc]) Parse a raw text document.
to_conll(doc_text) Converts a document to CoNLL format with spaCy POS tags.

Attributes

dir
dir = PosixPath('/Users/pizsak/nlp-architect/cache/bist-pretrained')
parse(doc_text, show_tok=True, show_doc=True)

Parse a raw text document.

Parameters:
  • doc_text (str) – Raw document text.
  • show_tok (bool, optional) – Specifies whether to include token text in output.
  • show_doc (bool, optional) – Specifies whether to include document text in output.
Returns:

The annotated document.

Return type:

CoreNLPDoc
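
A minimal usage sketch of the constructor and `parse`, assuming NLP Architect and the spaCy `en` model are installed. `build_default_parser` and `parse_texts` are hypothetical helpers written for this illustration and are not part of the library; `parse_texts` accepts any object that exposes the `parse` method documented above.

```python
def build_default_parser():
    """Construct a SpacyBISTParser using the documented default arguments."""
    # Deferred import: NLP Architect is only required when this is called.
    # With bist_model=None the documented default (the pre-trained model) is used.
    from nlp_architect.pipelines.spacy_bist import SpacyBISTParser
    return SpacyBISTParser(verbose=False, spacy_model="en", bist_model=None)


def parse_texts(parser, texts, show_tok=True, show_doc=True):
    """Call parser.parse on each raw text and return the results in order."""
    return [
        parser.parse(text, show_tok=show_tok, show_doc=show_doc)
        for text in texts
    ]


if __name__ == "__main__":
    parser = build_default_parser()
    docs = parse_texts(parser, ["The quick brown fox jumps over the lazy dog."])
    for annotated in docs:
        # Per the documentation above, each result is a CoreNLPDoc.
        print(annotated)
```

Building the parser once and reusing it for every document is a design choice of this sketch: the constructor takes model arguments, so it is assumed here to be the expensive step.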

to_conll(doc_text)

Converts a document to CoNLL format with spaCy POS tags.

Parameters:
  • doc_text (str) – Raw document text.
Yields:

list of ConllEntry – The next sentence in the document, in CoNLL format.
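
Because `to_conll` is documented as yielding one sentence at a time, it can be consumed lazily in a loop or comprehension. The sketch below assumes only that behavior; `conll_sentence_lengths` is a hypothetical helper, not part of the library, and it does not rely on any `ConllEntry` attributes.

```python
def conll_sentence_lengths(parser, doc_text):
    """Return the number of CoNLL entries in each sentence of doc_text."""
    # Each iteration of to_conll yields one sentence as a list of entries,
    # so the document is processed sentence by sentence.
    return [len(sentence) for sentence in parser.to_conll(doc_text)]
```

With a real `SpacyBISTParser` instance, `conll_sentence_lengths(parser, text)` would return one count per sentence found in `text`.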